Two samples are read and processed:
| Input data path |
|---|
| D:/data/bm_st/data_all/ST10_DHL/outs |
| D:/data/bm_st/data_all/ST5_DHL/outs |
The number of detected features, the total counts and the percentage of counts from mitochondrial genes in the two samples are shown below.
The following arguments are used to filter the data.
| Argument | Value | Meaning |
|---|---|---|
| min.cells | 10 | Include features detected in at least this many cells |
| min.feature | 200 | Include cells in which at least this many features are detected |
| percent.mt.limit | 20 | Include cells in which mitochondrial genes account for at most this percentage of counts |
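The three thresholds can be illustrated with a small, self-contained Python sketch. It is not the pipeline's code. Several choices are assumptions of this example: the genes-by-cells list-of-lists layout, the `MT-` prefix used to flag mitochondrial genes (a human gene-naming convention), and the order of the steps (cells by min.feature, then genes by min.cells, then cells by mitochondrial percentage).

```python
def qc_filter(counts, genes, min_cells=10, min_features=200, percent_mt_limit=20.0):
    """Return (kept gene indices, kept cell indices).

    counts: genes x cells matrix as a list of lists of non-negative counts.
    genes:  one name per row; names starting with "MT-" are treated as
            mitochondrial (assumption: human gene symbols).
    """
    n_genes, n_cells = len(counts), len(counts[0])
    # Cells: keep those with at least min_features detected (non-zero) genes.
    cells = [j for j in range(n_cells)
             if sum(counts[i][j] > 0 for i in range(n_genes)) >= min_features]
    # Genes: keep those detected in at least min_cells of the remaining cells.
    kept_genes = [i for i in range(n_genes)
                  if sum(counts[i][j] > 0 for j in cells) >= min_cells]
    # Cells: drop those whose mitochondrial share of counts exceeds the limit.
    kept_cells = []
    for j in cells:
        total = sum(counts[i][j] for i in kept_genes)
        mt = sum(counts[i][j] for i in kept_genes if genes[i].startswith("MT-"))
        if total > 0 and 100.0 * mt / total <= percent_mt_limit:
            kept_cells.append(j)
    return kept_genes, kept_cells


# Toy example with small thresholds: gene "RARE" is seen in only one cell,
# and cell 2 has 6 of its 9 counts (about 67%) in MT-CO1, so both are removed.
genes = ["CD3D", "LYZ", "MT-CO1", "RARE"]
counts = [[5, 0, 3, 1],
          [2, 4, 0, 1],
          [1, 1, 6, 0],
          [0, 0, 0, 1]]
print(qc_filter(counts, genes, min_cells=2, min_features=2, percent_mt_limit=20))
# → ([0, 1, 2], [0, 1, 3])
```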
Normalization, dimensionality reduction and clustering are performed following the standard Seurat workflow. The clustering results are shown below.
The following arguments are used in the step above.
| Argument | Value | Meaning |
|---|---|---|
| scale.factor | 10000 | Scale factor for cell-level normalization |
| vars.to.regress | | Variables to regress out |
| ndims | 50 | Total number of PCs to compute |
| PCs | 1:35 | PCs used by the FindNeighbors, RunTSNE and RunUMAP functions of Seurat |
| n.neighbors | 50 | Number of neighboring points used by the RunUMAP and FindNeighbors functions |
| resolution | 0.4 | The resolution argument of the FindClusters function in Seurat |
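With scale.factor, Seurat's default LogNormalize method divides each count by the cell's total count, multiplies the result by scale.factor and takes the natural log of one plus that value. A minimal per-cell Python sketch of the transform follows. The function name is hypothetical, and returning zeros for an empty cell is a choice made here, not Seurat's behavior.

```python
import math


def log_normalize(cell_counts, scale_factor=10000):
    """LogNormalize-style transform of one cell: ln(1 + count / total * scale_factor)."""
    total = sum(cell_counts)
    if total == 0:
        # Guard added for this sketch so an empty cell does not divide by zero.
        return [0.0] * len(cell_counts)
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


# A cell with counts [1, 3, 0]: relative abundances 0.25 and 0.75 become
# ln(2501) and ln(7501), and the zero stays zero.
print(log_normalize([1, 3, 0]))
```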
We used abcCellmap to annotate the cell types, with scmap and Seurat performing the prediction. The predicted cell types are shown below.
In addition, mapping the data to a reference dataset can identify shared cell states that are present across different datasets. We provide a reference dataset containing 1354 cells with 10 labels. The FindTransferAnchors function is used to integrate the query data with the reference data, and the predicted labels of the query data, shown below, are assigned by the TransferData function.
The relevant argument settings are as follows.
| Argument | Value | Meaning |
|---|---|---|
| PCs | 1:35 | The dims argument of the FindTransferAnchors and TransferData functions in Seurat |
To facilitate data visualization, we used phateR, umap and tsne for dimensionality reduction. The visualizations are shown below.
Differentially expressed genes are identified using the FindAllMarkers function of Seurat. All markers found are stored in ../Step6.Find_DEGs/sc_object.markerGenes.csv. The top 5 markers of each cluster are shown below.
The relevant argument settings are as follows.
| Argument | Value | Meaning |
|---|---|---|
| min.pct | 0.25 | The min.pct argument of the FindAllMarkers function in Seurat |
| logfc.threshold | 0.25 | The logfc.threshold argument of the FindAllMarkers function in Seurat |
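Before testing, FindAllMarkers discards genes that fail the min.pct and logfc.threshold filters. The Python sketch below approximates that prefilter for one gene and one cluster; it is not Seurat's code. It assumes log-normalized input (natural log, as produced by LogNormalize) and a log2 fold change computed from linear-scale means with a pseudocount of 1. That is our reading of recent Seurat versions, and older versions used the natural log instead. The statistical test itself (a Wilcoxon rank-sum test by default) is not shown, and the function name is hypothetical.

```python
import math


def marker_prefilter(expr_in, expr_out, min_pct=0.25, logfc_threshold=0.25):
    """Approximate the min.pct / logfc.threshold prefilter for one gene.

    expr_in / expr_out: log-normalized values ln(1 + x) of the gene in cells
    inside / outside the cluster. Returns (pct_1, pct_2, avg_log2fc, passes).
    """
    # Fraction of cells in which the gene is detected, inside and outside.
    pct_1 = sum(v > 0 for v in expr_in) / len(expr_in)
    pct_2 = sum(v > 0 for v in expr_out) / len(expr_out)
    # Average on the linear scale, then take log2 with a pseudocount of 1.
    mean_in = sum(math.expm1(v) for v in expr_in) / len(expr_in)
    mean_out = sum(math.expm1(v) for v in expr_out) / len(expr_out)
    avg_log2fc = math.log2(mean_in + 1) - math.log2(mean_out + 1)
    # Keep the gene if it is detected often enough in either group and the
    # absolute fold change reaches the threshold.
    passes = max(pct_1, pct_2) >= min_pct and abs(avg_log2fc) >= logfc_threshold
    return pct_1, pct_2, avg_log2fc, passes


# A gene at normalized level 3 in every cluster cell and absent elsewhere:
# pct_1 = 1.0, pct_2 = 0.0, avg_log2fc close to 2, so it passes.
print(marker_prefilter([math.log1p(3)] * 4, [0.0] * 4))
```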
The cell cycle phase of each cell is classified using the cyclone function of scran. The scores of the G1 and G2M phases are defined as the average expression of the cell cycle genes reported in "Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma". The proportion of cell cycle phases in each cluster is shown below.
The distribution of the G1G2 score is also shown below.
Cell heterogeneity within each cluster is reflected by the distribution of the Spearman correlation coefficients of gene expression between every pair of cells. The boxplot below shows the distribution of Spearman correlation coefficients for each cluster.
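The pairwise comparison can be sketched in plain Python. This is an illustration, not the pipeline's code. It assumes each cell is a list of expression values over the same genes, and it gives tied values their average rank, the usual convention for Spearman's rho. Each cluster yields one list of coefficients, which is what a boxplot would summarize. A cluster of n cells produces n(n-1)/2 pairs, so in practice large clusters may need to be subsampled.

```python
from itertools import combinations


def rank_average(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant profile
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(rank_average(x), rank_average(y))


def within_cluster_correlations(cells):
    """Spearman coefficients for every pair of cells in one cluster."""
    return [spearman(a, b) for a, b in combinations(cells, 2)]


print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # → 1.0
print(within_cluster_correlations([[1, 2, 3], [2, 3, 4], [3, 1, 2]]))
```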
The marker genes of the following lineages are visualized using violin plots.
We calculate lineage scores for specified gene sets based on the provided expression data. Four lineages (HSPC, myeloid, B cell, T/NK) are considered. The signatures of these four lineages are shown as follows.
| Lineage | Marker genes |
|---|---|
| HSPC lineage | CD34, KIT, AVP, FLT3, MME, CD7, CD38, CSF1R, FCGR1A, MPO, ELANE, IL3RA |
| Myeloid lineage | LYZ, CD36, MPO, FCGR1A, CD4, CD14, CD300E, ITGAX, FCGR3A, FLT3, AXL, SIGLEC6, CLEC4C, IRF4, LILRA4, IL3RA, IRF8, IRF7, XCR1, CD1C, THBD, MRC1, CD34, KIT, ITGA2B, PF4, CD9, ENG, KLF, TFRC |
| B cell lineage | CD79A, IGLL1, RAG1, RAG2, VPREB1, MME, IL7R, DNTT, MKI67, PCNA, TCL1A, MS4A1, IGHD, CD27, IGHG3 |
| T NK cell lineage | CD3D, CD3E, CD8A, CCR7, IL7R, SELL, KLRG1, CD27, GNLY, NKG7, PDCD1, TNFRSF9, LAG3, CD160, CD4, CD40LG, IL2RA, FOXP3, DUSP4, IL2RB, KLRF1, FCGR3A, NCAM1, XCL1, MKI67, PCNA, KLRF |
Heatmaps of the lineage scores and gene expression patterns are shown below.
The corresponding results are stored in lineage_signatures_scores.csv.
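The report does not say how the lineage scores are computed. One simple possibility, shown here purely as an assumption, is the mean expression of a lineage's marker genes in each cell, skipping markers that are absent from the data. The dictionary layout and the function name below are hypothetical.

```python
def lineage_scores(expr, signatures):
    """Mean expression of each lineage's marker genes, per cell.

    expr:       {gene: [value per cell]}, all lists of equal length.
    signatures: {lineage: [marker genes]}, e.g. the table above.
    Returns {lineage: [score per cell]}; NaN if no marker is present.
    """
    n_cells = len(next(iter(expr.values())))
    scores = {}
    for lineage, genes in signatures.items():
        present = [g for g in genes if g in expr]  # ignore missing markers
        if not present:
            scores[lineage] = [float("nan")] * n_cells
            continue
        scores[lineage] = [sum(expr[g][j] for g in present) / len(present)
                           for j in range(n_cells)]
    return scores


expr = {"CD34": [2.0, 0.0], "KIT": [1.0, 0.0], "CD3D": [0.0, 3.0]}
signatures = {"HSPC": ["CD34", "KIT", "AVP"], "T/NK": ["CD3D", "CD3E"]}
print(lineage_scores(expr, signatures))
# → {'HSPC': [1.5, 0.0], 'T/NK': [0.0, 3.0]}
```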
Gene Set Variation Analysis (GSVA) evaluates the enrichment of gene sets in each cluster to infer the clusters' functions. The GSVA package is used to perform GSVA on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway gene sets. The pathways with an adjusted P-value below 0.05 are shown in the heatmap.
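GSVA itself produces enrichment scores. The P-values presumably come from a downstream test that the report does not name, and the adjustment method is not stated either. The sketch below assumes the common Benjamini-Hochberg procedure (the same method as `p.adjust(method = "BH")` in R) and then applies the 0.05 cutoff. Both function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pathways(names, pvalues, alpha=0.05):
    """Names whose BH-adjusted p-value is below alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [name for name, q in zip(names, adjusted) if q < alpha]


# Raw p-values 0.01, 0.04, 0.03, 0.5 adjust to about 0.04, 0.053, 0.053, 0.5,
# so only the first pathway passes the 0.05 cutoff.
print(significant_pathways(["A", "B", "C", "D"], [0.01, 0.04, 0.03, 0.5]))
# → ['A']
```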
Trajectory analysis can help infer the differentiation process of hematopoietic cells at the single-cell level. To obtain reliable results, three methods (monocle2, slingshot and scVelo) are used for trajectory analysis.
The data is analyzed following the monocle2 tutorial, with the DDRTree algorithm used for dimensionality reduction. The Seurat clustering results and the states obtained from the orderCells function are presented along the minimum spanning tree below.
The data is also analyzed following the slingshot tutorial. The smooth curves modeling development along each lineage are shown in the space of the first two principal components.
The data is then analyzed following the scVelo tutorial. The scatter plot and stream plot of the inferred velocities are shown below.
Transcription factors (TFs) regulate the amount of messenger RNA (mRNA) produced by their target genes. TF analysis is performed using SCENIC. The output files and reports of SCENIC are located in ../Step13.TF_analysis/int. The inferred activity of each TF (regulon) in each cluster is shown in the heatmap.
Signal crosstalk between cells is crucial for cellular state and behavior. CellChat is used to infer and analyze cell-cell communication following the CellChat tutorial. The visualization of each cell-cell communication network is under ../Step15.Cell_cell_interection. The circle plot of the overall interactions is shown below.
The incoming and outgoing signaling patterns are also shown as follows.
The outputs include: